Abstract: A new technique for predicting species’ geographic distribution is
described. The approach involves three steps: (1) setting up geographic base
data; (2) collecting and georeferencing distributional points; and (3) modeling
ecological niches using the Biodiversity Species Workshop implementation of the
Genetic Algorithm for Rule-set Prediction (GARP). To illustrate these
procedures, an example based on the Brown Eared Pheasant (Crossoptilon
mantchuricum) is developed. This technique constitutes a useful tool for
assessing geographic distribution for questions of ecology, biogeography,
systematics, and conservation biology.

Geographic information system applications in conservation biology have
proceeded in two general directions. The first consists of developing
algorithms to predict species’ geographic distribution. Various approaches are
used to characterize species’ ecological niches in multivariate environmental
space; geographic areas fitting these conditions are then taken as the
predicted distribution, filling gaps created by uneven sampling (Nix, 1986;
Walker et al., 1991; Carpenter et al., 1993; Sperduto et al., 1996). The second
direction is prioritizing areas for conservation, as exemplified by Gap
Analysis and other applications (e.g. Daniels et al., 1991; Russell-Smith et
al., 1992; Bojorquez-Tapia et al., 1995; Harrison et al., 1995; Kiester et al.,
1996), in which distributional information is integrated into strategies for
reserve portfolio design. Integrated, these two efforts could provide a strong
basis for educated decisions regarding geographic priorities for conservation
(Peterson et al., 2000).

In China, a large-scale program entitled the “Terrestrial Vertebrate Wildlife
Resources Survey” has been in process since 1996. This effort involves
documenting distribution and population numbers for all endangered species, as
well as for terrestrial vertebrate species of special economic or ecological
interest. A particularly difficult challenge has been determining species’
distributional areas. New advances in geographic information systems and
inferential computer software (e.g. Stockwell et al., 1991), however, offer
important opportunities to overcome these challenges and to understand better
the distribution of Chinese terrestrial vertebrates.

In this paper, we describe an important advance in the first sector of
geographic information system applications to conservation biology: how
species’ geographic distribution can be predicted using the Genetic Algorithm
for Rule-set Prediction (GARP) modeling system. We work out a concrete example
based on an endangered species of pheasant, the Brown Eared Pheasant
(Crossoptilon mantchuricum), endemic to China, and discuss potential
applications of the approach.

1 Methods


The fundamental ecological niche of a species can be defined as the conjunction
of ecological conditions within which it is able to maintain populations; as
such, it is defined in multidimensional ecological/environmental space. The
fundamental niche, the focus of modeling efforts, must be distinguished
carefully from the realized niche (that part which is actually occupied), so as
to keep the modeling efforts focused on ecological dimensions important to a
particular species.

1.1 Several approaches have been used to approximate species’ fundamental
ecological niches. The simplest is BIOCLIM (Nix, 1986), which involves tallying
species’ occurrences in categories of each environmental dimension, trimming
marginal portions of the distribution, and taking the niche as the conjunction
of the trimmed ranges. Although easy to implement and conceptually attractive,
BIOCLIM suffers an odd reduction in efficacy when many environmental dimensions
are included: the number of environmental combinations simply overwhelms most
sampling protocols. BIOCLIM also suffers in general from high commission error
rates.
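
The envelope idea behind BIOCLIM can be illustrated with a minimal sketch. It is not the BIOCLIM implementation itself: it assumes continuous variables rather than tallied categories, the function names and the trimming fraction are hypothetical, and the temperature and precipitation values are invented purely for illustration.

```python
from typing import Dict, List, Tuple

Point = Dict[str, float]  # environmental values at one occurrence point


def trimmed_range(values: List[float], trim: float) -> Tuple[float, float]:
    """Range left after dropping a fraction `trim` of values from each tail."""
    ordered = sorted(values)
    # Never trim so much that no values remain.
    k = min(int(len(ordered) * trim), (len(ordered) - 1) // 2)
    kept = ordered[k:len(ordered) - k]
    return kept[0], kept[-1]


def fit_envelope(occurrences: List[Point], trim: float = 0.05) -> Dict[str, Tuple[float, float]]:
    """One trimmed range per environmental dimension."""
    dims = list(occurrences[0].keys())
    return {d: trimmed_range([p[d] for p in occurrences], trim) for d in dims}


def in_envelope(cell: Point, envelope: Dict[str, Tuple[float, float]]) -> bool:
    """A cell is inside the niche only if every dimension is within range
    (the conjunction of the trimmed ranges)."""
    return all(lo <= cell[d] <= hi for d, (lo, hi) in envelope.items())


# Invented values, for illustration only.
occurrences = [
    {"temp": 5, "precip": 400},
    {"temp": 6, "precip": 450},
    {"temp": 7, "precip": 500},
    {"temp": 8, "precip": 550},
    {"temp": 30, "precip": 2000},  # marginal record removed by trimming
]
envelope = fit_envelope(occurrences, trim=0.2)
```

Because a cell must satisfy every dimension at once, each added dimension multiplies the number of environmental combinations that the occurrence sample has to cover, which is the weakness noted above.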

1.2 A second class of approaches is based on logistic multiple regression,
techniques aimed at predicting the probability of “yes” versus “no” in the
dependent variable. This idea combines well with the concept of physiological
tolerances determining species’ presence along continuous climate dimensions,
but does less well when categorical information (e.g. vegetation type and soil
type) is also to be included. In effect, logistic regression divides
environmental space into two portions (“habitable” and “uninhabitable”), an
approach that may be useful under some circumstances. Implementations of this
approach have included important improvements, such as relaxation of
distributional assumptions regarding errors in the regression.
1.3 Finally, the Genetic Algorithm for Rule-set Prediction (GARP) includes both
of the methods described above, as well as other set-based approaches, in an
iterative, artificial-intelligence-based approach (Stockwell et al., 1992).
Here, individual algorithms are used to produce component “rules” in the
broader rule-set, and hence portions of the species’ distribution may be
determined as within or without the niche based on different algorithms. As
such, GARP should represent a superset of the other approaches, and should
always have greater predictive ability than any one of them. Extensive testing
of GARP has indicated excellent predictive ability and insensitivity to
BIOCLIM’s problems with environmental data density (Peterson et al., 1999a).

The GARP algorithm works in an iterative process of rule selection, evaluation,
testing, and incorporation or rejection. Occurrence data are divided into two
halves: training data (for model building) and test data (for model evolution).
First, a method is chosen from a set of possible tools [e.g. logistic
regression, BIOCLIM rules (Nix, 1986), etc.], applied to the training data, and
a rule developed. Predictive accuracy is evaluated based on 1 250 points
resampled from the test data and 1 250 points sampled randomly from the study
region as a whole, and accuracy is calculated as the sum of points correctly
predicted as present or absent, divided by the total number of points in the
map (Stockwell et al., 1992). The change in predictive accuracy from one
iteration to the next is used to evaluate whether a particular rule should be
incorporated into the model. The algorithm runs 1 000 iterations
(“generations”) or until addition of rules has no appreciable effect on the
accuracy measure (“fitness”). Complete details and documentation of the
algorithm are available at http://biodi.sdsc.edu.
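
The accuracy measure described above can be sketched as follows. This is an illustration under stated assumptions, not GARP's code: it assumes that resampling is done with replacement, that points drawn from the study region are treated as absences, and that the denominator is the number of sampled points. The names `accuracy` and `draw_samples` and the example rule are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Coord = Tuple[float, float]  # (longitude, latitude)


def draw_samples(test_points: Sequence[Coord],
                 region_points: Sequence[Coord],
                 n: int = 1250,
                 seed: int = 0) -> Tuple[List[Coord], List[Coord]]:
    """Draw n presence points from the test data and n points from the region."""
    rng = random.Random(seed)
    presence = rng.choices(test_points, k=n)      # resampled with replacement (assumption)
    background = rng.choices(region_points, k=n)  # random draw from the study region
    return presence, background


def accuracy(predict: Callable[[Coord], bool],
             presence: Sequence[Coord],
             background: Sequence[Coord]) -> float:
    """Points correctly predicted present or absent, divided by all sampled points.

    Background points are treated as absences, an assumption of this sketch.
    """
    correct_present = sum(1 for p in presence if predict(p))
    correct_absent = sum(1 for p in background if not predict(p))
    return (correct_present + correct_absent) / (len(presence) + len(background))


# Hypothetical rule: predict presence east of 111 degrees E.
def rule(coord: Coord) -> bool:
    return coord[0] >= 111.0


presence, background = draw_samples([(111.42, 37.75), (112.00, 38.93)],
                                    [(100.0, 30.0), (105.0, 35.0)])
fitness = accuracy(rule, presence, background)
```

A candidate rule would then be kept or rejected according to how this value changes from one iteration to the next.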

The principal steps in the modeling approach developed herein are assembly of
base geographic data layers, collection of species occurrence records
(longitude and latitude), building ecological niche models using GARP, and
predicting geographic distribution. The following details each step.

1.3.1 Step one: Preparing base geographic coverages

Necessary for ecological niche modeling is an information base that includes
environmental dimensions relevant to distributional and ecological limitation
of the species in question. Important dimensions frequently include measures of
temperature and precipitation (both averages and extremes), vegetation types,
and elevation, among others. Additional environmental dimensions may be
relevant to applications to particular taxa or regions. Geographic data can
exist in the form of continuous or categorical information.

Many sources of these geographic coverages are available for building
geographic base layers. An excellent source of varied information is ESRI’s
(1996) ArcAtlas, a set of maps of more than 40 geographic themes at scales of
1:10 000 000 for Europe; 1:20 000 000 for North and South America, Africa, and
Antarctica; and 1:25 000 000 for Asia and Australia. Other excellent sources of
geographic base layers include the EROS Data Center for satellite imagery,
digital elevation models, and other data types. Many regional data sources are
available for particular geographic applications.

All geographic base data layers are converted to raster grid format. Grids must
be exactly coincident among coverages, including numbers of rows and columns,
cell sizes, and grid locations. These operations can be achieved easily using
the raster grid import/export capabilities of ArcView (versions 3 and later).
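
The coincidence requirement can be checked mechanically before layers are mounted. The sketch below is one possible check, not part of ArcView or the BSW; the `GridSpec` fields and the example dimensions are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GridSpec:
    """Geometry of one raster coverage (hypothetical container)."""
    ncols: int
    nrows: int
    x_origin: float
    y_origin: float
    cell_width: float
    cell_height: float


def coincident(a: GridSpec, b: GridSpec, tol: float = 1e-9) -> bool:
    """True if two grids share rows, columns, cell sizes, and location."""
    if (a.ncols, a.nrows) != (b.ncols, b.nrows):
        return False
    pairs = [
        (a.x_origin, b.x_origin),
        (a.y_origin, b.y_origin),
        (a.cell_width, b.cell_width),
        (a.cell_height, b.cell_height),
    ]
    return all(abs(x - y) <= tol for x, y in pairs)


def all_coincident(grids: Sequence[GridSpec]) -> bool:
    """True if every grid matches the first one."""
    return all(coincident(grids[0], g) for g in grids[1:])


# Illustrative values only.
temperature = GridSpec(240, 200, 73.0, 18.0, 0.25, 0.25)
vegetation = GridSpec(240, 200, 73.0, 18.0, 0.25, 0.25)
soils = GridSpec(240, 199, 73.0, 18.0, 0.25, 0.25)  # one row short
```

Here `all_coincident([temperature, vegetation])` holds, whereas adding `soils` would fail the check because its row count differs.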

For Chinese applications, we have already developed a basic set of geographic
base data coverages extending across all of China based on ESRI (1996).
Included are coverages summarizing low and high temperature and precipitation
in January, July, and year-round; low and high solar radiation; snow cover;
geology; soils; geomorphology; and vegetation types. This set of base layers
has been rasterized with pixels of 21.930 km x 16.997 km, and is available for
public use on the BSW facility.

1.3.2 Step two: Distributional data for species

The second step in the process is aggregation of sets of points representing
known occurrences of the species in question. These data are available from a
number of sources, including museum specimen tag data, monographic treatments
that include locality information, and observational data sets (e.g. censuses,
sightings, and compilations, as well as results of actual fieldwork). A
particularly useful source, though in prototype stage, is the Species Analyst,
a distributed database including different biodiversity databases worldwide.
Textual locality descriptions must be converted into standard
latitude-longitude coordinates based on gazetteers or by reference to maps.

1.3.3 Step three: Modeling niches and predicting distribution

Once base geographic coverages have been prepared and mounted, and species’
occurrence records assembled and georeferenced, analysis can begin. The BSW
facility allows researchers to perform a variety of analyses on biodiversity
data in real time on powerful computers at the San Diego Supercomputer Center
via the World Wide Web. Latitude and longitude data are submitted from a web
browser, and occurrence records can be mapped and manipulated, ecological niche
models developed, and models projected as predicted geographic distribution.
The BSW application is composed of four frames in a web browser, which are
described and documented on the website. Most operations are menu-driven,
making use convenient. A brief description of important commands and operations
follows.

(1) Base data. The base data option allows selection of geographic base data
for regions to be analyzed. On the BSW, world data sets are available at coarse
scales, and regional data at finer scales. Data for China are available at a
resolution of 17 km x 22 km pixels for analysis. Users are given options of
selecting particular regions, or selecting from world data by limiting
geographic coordinates.

(2) Biological data. Here, two options are available. First (“upload”),
species’ occurrence points can be entered as a list of longitude-latitude pairs
separated by spaces, one line per occurrence point. A blank line between lines
of data plots sets of points in different colors. Points can be entered
manually, pasted in from other applications, or imported as ASCII files.
Alternatively (“from database”), for some taxa and regions, distributional data
are provided for retrieval on the website, and can be accessed using this
option.
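
The upload format lends itself to a quick local check before pasting. The following is a minimal sketch of a parser for it, assuming decimal degrees and whitespace separation; `parse_points` is a hypothetical helper and not part of the BSW.

```python
from typing import List, Tuple

Coord = Tuple[float, float]  # (longitude, latitude)


def parse_points(text: str) -> List[List[Coord]]:
    """Parse 'longitude latitude' lines; a blank line starts a new point set."""
    groups: List[List[Coord]] = [[]]
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            if groups[-1]:          # close the current set, ignore repeated blanks
                groups.append([])
            continue
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 'longitude latitude'")
        lon, lat = float(fields[0]), float(fields[1])
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            raise ValueError(f"line {number}: coordinates out of range")
        groups[-1].append((lon, lat))
    if not groups[-1]:              # drop an empty trailing set
        groups.pop()
    return groups


upload = """111.42 37.75
111.48 37.90

116.50 40.25
"""
point_sets = parse_points(upload)
```

The range check also catches the most common slip, entering latitude before longitude, whenever the swapped value falls outside the valid latitude range.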

(3) Modify data. Here, particular geographic base layers and species’
occurrence points can be visualized, and can be included in or excluded from
analysis. Additional occurrence points can be added interactively by clicking
on the map image in the screen frame.

(4) Make model. This frame is central to the BSW, presenting options for
generating ecological niche models from which geographic predictions are
developed. Four alternative prediction algorithms may be selected: BIOCLIM (an
approach based on frequency ranges on environmental dimensions), E-ball (an
approach based on distance measures), Logit (logistic multiple regression), and
GARP. Herein, we focus on GARP, which has proven to be the best of the four
methods in a variety of tests.

Several options are available for modifying and adjusting GARP models. The
convergence criteria option is a parameter of the genetic algorithm in GARP;
decreasing this parameter tightens the requirement for stability of the final
model, improving results but also increasing processing time. The resampling
type option controls how training data are prepared: specifying 0 constitutes
the training data in proportion to the frequency of values in the area, whereas
specifying 1 populates the training data set in equal proportions of presences
and absences (default). The frequency for dumping intermediate models option
stores intermediate models so that later options can be used to review the
progress of the genetic algorithm.

(5) Model output. The immediate output of the modeling procedure is in the form
of an image map, which allows visualization of model results. Although these
image maps are useful, and can readily be extracted into word-processing
applications, several additional options allow users to enter deeper into the
modeling process. The most useful of these options are as follows.

Accuracy allows model efficiency to be evaluated. Each rule is assigned an
estimate of its predictive accuracy, its posterior probability. The dimension
bar in this frame is a list of threshold probabilities for inclusion:
specifying 0.8 presents accuracy based on rules with expected accuracy of 0.8
or better; specifying 0 evaluates all rules. Results are shown as a confusion
matrix, with rows and columns representing the predicted and actual values,
respectively. The cells on the diagonal are correct predictions, while cells
off the diagonal represent incorrect predictions.
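
A confusion matrix of this layout can be built as in the sketch below. It assumes a simple present/absent prediction, and the sample values are invented for illustration; it does not reproduce the BSW output format.

```python
from typing import List, Sequence


def confusion_matrix(predicted: Sequence[bool], actual: Sequence[bool]) -> List[List[int]]:
    """2 x 2 counts: rows are predicted (present, absent), columns are actual."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    matrix = [[0, 0], [0, 0]]
    for p, a in zip(predicted, actual):
        matrix[0 if p else 1][0 if a else 1] += 1
    return matrix


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Share of points on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    return (matrix[0][0] + matrix[1][1]) / total


# Invented example: five evaluation points.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
matrix = confusion_matrix(predicted, actual)
```

In this layout `matrix[0][1]` counts points predicted present but actually absent (commission errors), and `matrix[1][0]` counts points predicted absent but actually present (omission errors).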

The alternatives option produces a list of rules in the model, together with
associated performance indices. Predictions resulting from each rule can be
viewed in isolation. Choosing among alternative rules is one way for experts to
bring their specialized knowledge and biological insight to bear on problems.
The combine rules option allows users to display model rules singly or in
combination for further exploration of data. Finally, the overlays option
allows users to alter the form of output. Scale and extent of maps can be
changed, data points added, and areas labeled. Perhaps most important are
specifications for map download and output formats: as a .gif file for image
viewing, in PostScript format for Microsoft Word documents, and as ASCII raster
grids for upload into GIS programs.

Once ecological niche models are in hand, additional possibilities open,
including modeling distributional changes with climate change and visualizing
outputs in virtual reality.


2 Example 


The Brown Eared Pheasant (Crossoptilon mantchuricum) is a species endemic to
northern China. Remaining numbers are probably in the range of 1 000 to 5 000
individuals. The species is now restricted to five or six disjunct areas, of
which four are reserves: three in Shanxi and one in Hebei; a fifth population
was recently found close to Beijing (McGowan, 1994). The species suffered a
major decline historically owing to widespread deforestation of mountains
within its geographic range. The species is classified as endangered under the
revised IUCN Red List. Considering its minuscule distributional area, small
population size, and endemism within China, the Chinese government has already
listed the species on the “First-class Protected Birds List.”

To predict the distribution of this species, we obtained base geographic data
coverages from ESRI (1996). In all, 23 themes were included: snow cover (low
and high); annual, January, and July precipitation (low and high); annual,
January, and July temperature (low and high); land use; geography; solar
radiation (low and high); human population; soils; vegetation; morphological
structure; and frost-free period. We converted themes into ASCII raster grids
using ArcView, and sent them to the BSW facility in San Diego to be mounted as
base coverages. Covering all of China, these base coverages are now publicly
available for other applications.

Searching the scientific literature available to us (Salvadori, 1895; Peters,
1940; Cheng, 1979; Schauensee, 1984; Cheng, 1987), and with the kind
collaboration of BirdLife International, we obtained 13 unique occurrence
points for the species. Textual geographic references were then converted to
latitude-longitude pairs by direct consultation of maps, resulting in the
following set of coordinates: 111.42°E 37.75°N, 111.48°E 37.90°N, 111.53°E
37.58°N, 111.57°E 38.70°N, 112.00°E 38.93°N, 112.20°E 39.43°N, 112.30°E
39.02°N, 112.50°E 37.92°N, 114.93°E 40.83°N, 114.99°E 39.85°N, 115.00°E
40.00°N, 115.33°E 40.00°N, and 116.50°E 40.25°N.

To perform the actual modeling on the BSW facility, we chose China at base
data, pasted in the geographic coordinates of the occurrence points at
biological data upload, eliminated coverages not desired for analysis in modify
data, and selected GARP and convergence criteria of 0.025 at make model. Once
the model was built, the predicted map was displayed, and clicking at accuracy
showed the predictive accuracy. Clicking overlays, we selected ARCgrid and
again made a model, which produced an ASCII raster grid. This grid was copied,
and saved in Word 97 as an ASCII text file with line breaks.

In ArcView GIS (version 3.1), we imported the ASCII grid, which permitted
further visualization of the prediction, including overlay with political
geographic coverages. To further refine the prediction, we used published range
maps to reduce the predicted distribution to areas likely inhabited (other
areas are at times predicted because conditions appropriate for populations can
exist in areas outside of the actual geographic distribution owing to
historical factors such as limited colonization ability). ArcView’s tabulate
areas function was used to count pixels of predicted presence.
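
For readers without ArcView, the pixel count can be approximated directly from the exported text grid. The sketch below assumes the common ESRI ASCII grid layout (keyword header lines followed by rows of cell values) and assumes that predicted presence is coded as 1; the source does not state the coding used by the BSW, and the sample grid is invented.

```python
from typing import Dict, List, Tuple


def read_ascii_grid(text: str) -> Tuple[Dict[str, float], List[List[float]]]:
    """Split an ASCII raster grid into header keywords and rows of cell values."""
    header: Dict[str, float] = {}
    rows: List[List[float]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0][0].isalpha():            # header line, e.g. 'ncols 4'
            header[fields[0].lower()] = float(fields[1])
        else:                                 # data row
            rows.append([float(v) for v in fields])
    return header, rows


def count_cells(rows: List[List[float]], value: float = 1.0) -> int:
    """Number of cells equal to `value` (1 = predicted presence, by assumption)."""
    return sum(1 for row in rows for v in row if v == value)


SAMPLE_GRID = """ncols 4
nrows 3
xllcorner 110.0
yllcorner 37.0
cellsize 0.25
NODATA_value -9999
0 1 1 0
0 1 0 0
-9999 0 0 1
"""
header, rows = read_ascii_grid(SAMPLE_GRID)
presence_cells = count_cells(rows)
```

Comparing `len(rows)` and the row lengths against the `nrows` and `ncols` header values is a cheap way to detect a grid that was truncated when copied through a word processor.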

Given an average pixel size of 373.74 km² in the Chinese coverages used, we
predicted the raw, unadjusted distributional area of the Brown Eared Pheasant
in China to be 376 730 km² (1 008 cells). However, taking into account the
biogeographic limitations of the species (Peterson et al., 1999b), its
predicted distributional area was reduced to 174 910 km² (468 cells), which
constitutes the species’ potential distributional area. Finally, using U.S.
Geological Survey land use/land cover classifications based on AVHRR satellite
imagery available at the EROS Data Center website, we reduced our prediction to
natural habitats within the species’ potential distributional area, which
totaled only 11 960 km² (32 cells) in seven isolated sites: one in Beijing, two
in Hebei Province, and four in Shanxi Province.
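
The reported areas follow from multiplying cell counts by the average pixel size, as the short check below shows. The function name is hypothetical; the cell counts and the pixel size are those given in the text, and results are rounded to the nearest 10 km² as reported.

```python
PIXEL_AREA_KM2 = 373.74  # average pixel size given in the text


def predicted_area_km2(cells: int, pixel_area: float = PIXEL_AREA_KM2) -> float:
    """Area covered by a number of predicted-presence cells."""
    return cells * pixel_area


stages = [
    ("raw prediction", 1008),
    ("within biogeographic limits", 468),
    ("natural habitat only", 32),
]
for label, cells in stages:
    area = round(predicted_area_km2(cells), -1)  # nearest 10 km²
    print(f"{label}: {cells} cells, about {area:.0f} km²")
```

The final figure is roughly 3% of the raw prediction (32 of 1 008 cells), which shows how strongly the biogeographic and land cover filters constrain the estimate.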

While the most recent range estimates for the species suggest that it is now
limited to about 13 600 km², our predicted area is 11 960 km². The two
independent estimates are remarkably close; the difference could spring from
the timing of the estimates and the different versions of vegetation maps used.
We used an updated vegetation map to refine our predictions. Because of
deforestation, suitable habitat area for the species is decreasing.

3 Discussion


The approach described above opens exciting new opportunities for studying
geographic distribution of species. The approach is based on contrasts between
characteristics of known occurrence points and those of the landscape of the
region in general. The general approach offers several important features: (1)
tools are low-impact and fast, allowing interactive applications to be
developed; (2) data formats are open, and can be integrated with custom
scripts, permitting development of applications that bridge the gaps between
“data,” “analysis,” and “visualization”; and (3) the modeling procedure is
scale-independent, making possible applications at almost any spatial scale.

Several sources of potential error do exist in the modeling procedure. First,
distribution will often be overpredicted (i.e. predicted area too large)
because of omission of critical limiting ecological dimensions from the
analysis. Although distributional predictions appear to stabilize with
relatively small numbers of ecological dimensions, such errors can be detected
only through addition of more ecological dimensions to the analysis, or via
procedures such as jackknifing of the inclusion of dimensions to detect
instability (Peterson et al., 1999a). Additional error in predictions commonly
springs from historical influences on distribution (Peterson et al., 1999b);
for example, a given species may be absent from a mountain range not for lack
of appropriate conditions, but rather because intervening lowland habitats
prevented its ever having reached that range. An effective solution to this
complication involves limiting predicted areas to those biotic regions or
geographic units from which the species has actually been recorded, provided
that sampling has been sufficiently intensive to make absence of such records
reliable. Correction for these sources of error is thus feasible if models and
predictions are interpreted carefully, and with consideration of possible
biases and confusions, yielding distributional predictions that are highly
believable.
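
The correction for historical effects amounts to intersecting the predicted cells with the regions that hold actual records. A minimal sketch follows, assuming each grid cell has already been assigned to a biotic region; the region labels and cell indices are hypothetical.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (row, column) in the prediction grid


def restrict_to_recorded_regions(predicted: Set[Cell],
                                 region_of: Dict[Cell, str],
                                 recorded_regions: Set[str]) -> Set[Cell]:
    """Keep only predicted cells lying in regions where the species was recorded."""
    return {cell for cell in predicted if region_of.get(cell) in recorded_regions}


# Hypothetical example: region "B" is suitable but has no records.
predicted = {(0, 0), (0, 1), (5, 5)}
region_of = {(0, 0): "A", (0, 1): "A", (5, 5): "B"}
potential = restrict_to_recorded_regions(predicted, region_of, {"A"})
```

Cells without a region assignment are dropped as well, which is the conservative choice when the regional map does not cover the whole prediction.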

Potential applications of this approach of ecological niche modeling and
distributional prediction are numerous, including the following:

(1) Prediction of distribution of rare and poorly known species, including
species so rare that localization of populations is difficult without
inferential approaches.

(2) Prediction of areas of potential distribution for rare and endangered
species, permitting design of strategies for reintroduction of species to
natural areas.

(3) Evaluation of niche dimensions of single species or multiple species for
use in evolutionary and comparative studies of evolutionary change in
ecological dimensions.

(4) Distributional prediction for suites of species of conservation concern,
allowing development of strategies for protected areas systems or evaluation of
potential for negative environmental impacts.

(5) Use of ecological niche models for synthetic models predicting
distributional shifts under scenarios of global climate change, or species
invasions of presently uninhabited areas.

Further exploration and application by investigators with diverse interests
will undoubtedly add many more possible applications to the list.